182 research outputs found

    True Detective: A Deep Abductive Reasoning Benchmark Undoable for GPT-3 and Challenging for GPT-4

    Large language models (LLMs) have demonstrated solid zero-shot reasoning capabilities, which is reflected in their performance on the current test tasks. This calls for a more challenging benchmark requiring highly advanced reasoning ability to be solved. In this paper, we introduce such a benchmark, consisting of 191 long-form (1200 words on average) mystery narratives constructed as detective puzzles. Puzzles are sourced from the "5 Minute Mystery" platform and include a multiple-choice question for evaluation. Only 47% of humans solve a puzzle successfully on average, while the best human solvers achieve over 80% success rate. We show that GPT-3 models barely outperform random on this benchmark (with 28% accuracy) while state-of-the-art GPT-4 solves only 38% of puzzles. This indicates that there is still a significant gap in the deep reasoning abilities of LLMs and humans and highlights the need for further research in this area. Our work introduces a challenging benchmark for future studies on reasoning in language models and contributes to a better understanding of the limits of LLMs' abilities. Comment: 5 pages, to appear at *SE

    Multi-Domain Neural Machine Translation

    We present an approach to neural machine translation (NMT) that supports multiple domains in a single model and allows switching between the domains when translating. The core idea is to treat text domains as distinct languages and use multilingual NMT methods to create multi-domain translation systems; we show that this approach results in significant translation quality gains over fine-tuning. We also explore whether knowledge of pre-specified text domains is necessary; it turns out that it is, but also that quite high translation quality can be reached when the domain is not known. Comment: Accepted to EAMT'2018, In Proceedings of the 21st Annual Conference of the European Association for Machine Translation (EAMT'2018

    Voting and Stacking in Data-Driven Dependency Parsing

    Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kristiina Jokinen and Eckhard Bick. NEALT Proceedings Series, Vol. 4 (2009), 219-222. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9206

    DNA Repair Proteins as Molecular Targets for Cancer Therapeutics

    Cancer therapeutics include an ever-increasing array of tools at the disposal of clinicians in their treatment of this disease. However, cancer is a tough opponent in this battle, and current treatments, which typically include radiotherapy, chemotherapy and surgery, are often not enough to rid the patient of his or her cancer. Cancer cells can become resistant to the treatments directed at them, and overcoming this drug resistance is an important research focus. Additionally, increasing discussion and research is centering on targeted and individualized therapy. While a number of approaches have undergone intensive and close scrutiny as potential ways to treat and kill cancer (signaling pathways, multidrug resistance, cell cycle checkpoints, anti-angiogenesis, etc.), much less work has focused on blocking the ability of a cancer cell to recognize and repair the damaged DNA which primarily results from the front-line cancer treatments: chemotherapy and radiation. More recent studies on a number of DNA repair targets have produced proof-of-concept results showing that selective targeting of these DNA repair enzymes has the potential to enhance and augment the currently used chemotherapeutic agents and radiation as well as to overcome drug resistance. Some of the targets identified result in the development of effective single-agent anti-tumor molecules. While it is inherently convoluted to think that inhibiting DNA repair processes would be a likely approach to kill cancer cells, careful identification of specific DNA repair proteins is increasingly appearing to be a viable approach in the cancer therapeutic cache

    Mixing and blending syntactic and semantic dependencies

    Our system for the CoNLL 2008 shared task uses a set of individual parsers, a set of stand-alone semantic role labellers, and a joint system for parsing and semantic role labelling, all blended together. The system achieved a macro-averaged labelled F1-score of 79.79 (WSJ 80.92, Brown 70.49) for the overall task. The labelled attachment score for syntactic dependencies was 86.63 (WSJ 87.36, Brown 80.77) and the labelled F1-score for semantic dependencies was 72.94 (WSJ 74.47, Brown 60.18)

    Findings of the 2019 Conference on Machine Translation (WMT19)

    This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019. Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation